75 research outputs found

    Sequence analysis in bioinformatics: methodological and practical aspects

    2011 - 2012
    My PhD research activities have focused on the development of new computational methods for biological sequence analysis. To overcome an intrinsic problem in protein sequence analysis, namely inferring homologies in large protein databases from short queries, I developed a BLAST-based statistical framework to detect distant homologies conserved in the transmembrane domains of different bacterial membrane proteins. Using this framework, the transmembrane protein domains of all Salmonella spp. were screened and more than five thousand significant homologies were identified. My results show that the proposed framework detects distant homologies that, because of their conservation in distinct bacterial membrane proteins, could represent ancient signatures of primeval genetic elements (or mini-genes) coding for short polypeptides that formed, through a primitive assembly process, more complex genes. Further, my statistical framework lays the foundation for new bioinformatics tools for domain-oriented homology detection, that is, for finding statistically significant homologies in specific target domains. The second problem that I addressed concerns the analysis of transcripts obtained from RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with operon predictions based on sequence features to accurately infer operons in prokaryotic genomes. Since the transcriptome of an organism is dynamic and condition dependent, the mapped RNA-Seq reads are used to determine a set of confirmed or predicted operons, from which specific transcriptomic features are extracted and combined with standard genomic features to train and validate three operon classification models (Random Forests, RFs; Neural Networks, NNs; and Support Vector Machines, SVMs). These classifiers have been used to refine the operon map annotated by DOOR, one of the most widely used databases of prokaryotic operons. This method showed that integrating genomic and transcriptomic features improves the accuracy of operon predictions, and that it is possible to predict the existence of potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes not expressed under the condition studied. I evaluated my approach on different RNA-Seq based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis. These transcriptome profiles were obtained using the standard RNA-Seq or the strand-specific RNA-Seq method. My experimental results demonstrate that the three classifiers achieved accurate operon maps including reliable predictions of new operons. [edited by author] XI n.s

    Multi-Objective Genetic Algorithm for Multi-View Feature Selection

    Multi-view datasets offer diverse forms of data that can enhance prediction models by providing complementary information. However, the use of multi-view data increases the dimensionality of the data, which poses significant challenges for prediction models and can lead to poor generalization. Therefore, relevant feature selection from multi-view datasets is important, as it not only addresses poor generalization but also enhances the interpretability of the models. Despite their success, traditional feature selection methods are limited in leveraging intrinsic information across modalities, lack generalizability, and are tailored to specific classification tasks. We propose a novel genetic algorithm strategy to overcome these limitations of traditional feature selection methods for multi-view data. Our proposed approach, called the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within a view and between views under a unified framework. The MMFS-GA framework demonstrates superior performance and interpretability for feature selection on multi-view datasets in both binary and multiclass classification tasks. The results of our evaluations on three benchmark datasets, including synthetic and real data, show improvement over the best baseline methods. This work provides a promising solution for multi-view feature selection and opens up new possibilities for further research in multi-view datasets.
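
    As a rough illustration of the multi-objective idea mentioned in this abstract, the sketch below filters candidate feature subsets down to a Pareto front. It is not the MMFS-GA implementation: the two objectives (error rate and number of selected features), the function names and the toy values are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical candidate: (error_rate, n_selected_features); both are minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Toy values, invented for illustration only.
candidates = [(0.10, 40), (0.12, 15), (0.10, 55), (0.20, 5), (0.25, 8)]
front = pareto_front(candidates)
print(front)
```

    A genetic algorithm with several objectives typically ranks individuals with a dominance test of this kind rather than collapsing the objectives into a single score.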

    Knowledge Generation with Rule Induction in Cancer Omics

    The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance in many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, such as the classification of clinically relevant sub-groups of patients and the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the usage of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.
    Peer reviewed
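
    To make the notion of a human-readable associative rule concrete, the sketch below scores a single IF-THEN rule by its support and confidence, two standard measures in association rule mining. The attribute names (GENE_A, phenotype), the toy samples and the function name are hypothetical and do not come from the review.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, str]


def rule_metrics(
    samples: List[Sample],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule IF antecedent THEN consequent."""
    # Samples that satisfy the IF part of the rule.
    matches_if = [
        s for s in samples
        if all(s.get(k) == v for k, v in antecedent.items())
    ]
    # Of those, the samples that also satisfy the THEN part.
    matches_both = [
        s for s in matches_if
        if all(s.get(k) == v for k, v in consequent.items())
    ]
    support = len(matches_both) / len(samples) if samples else 0.0
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence


# Toy discretized omics table, invented for illustration only.
samples = [
    {"GENE_A": "high", "phenotype": "tumor"},
    {"GENE_A": "high", "phenotype": "tumor"},
    {"GENE_A": "high", "phenotype": "normal"},
    {"GENE_A": "low", "phenotype": "normal"},
]
support, confidence = rule_metrics(
    samples, {"GENE_A": "high"}, {"phenotype": "tumor"}
)
print(f"IF GENE_A=high THEN phenotype=tumor: "
      f"support={support:.2f}, confidence={confidence:.2f}")
```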

    A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery

    bbab314
    Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for developing metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. Moreover, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in co-expression networks of genes. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
    Peer reviewed
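
    The step of transforming gene expression into pathway-level information can be sketched in its simplest form as averaging the expression of each pathway's member genes. This is a deliberately simplified stand-in, not the network diffusion and gene set enrichment method proposed in the abstract; the gene and pathway names are placeholders.

```python
from typing import Dict, List


def pathway_scores(
    expression: Dict[str, float],
    pathways: Dict[str, List[str]],
) -> Dict[str, float]:
    """Average the expression of the measured genes in each pathway.

    Pathways with no measured genes are skipped rather than scored as zero.
    """
    scores: Dict[str, float] = {}
    for name, genes in pathways.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            scores[name] = sum(values) / len(values)
    return scores


# Placeholder sample: gene-level expression for one patient.
expression = {"G1": 2.0, "G2": 4.0, "G3": -1.0}
# Placeholder gene sets; G8 and G9 are not measured in this sample.
pathways = {"P_A": ["G1", "G2"], "P_B": ["G3", "G9"], "P_C": ["G8"]}

print(pathway_scores(expression, pathways))
```

    Applying such a function to every sample yields a samples-by-pathways matrix that can be passed to any clustering algorithm in place of the gene-level matrix.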

    Multi-omics analysis of ten carbon nanomaterials effects highlights cell type specific patterns of molecular regulation and adaptation

    New strategies to characterize the effects of engineered nanomaterials (ENMs) based on omics technologies are emerging. However, given the intricate interplay of multiple regulatory layers, the study of a single molecular species in exposed biological systems might not provide the granularity needed to identify the pathways of toxicity (PoT) and, hence, to portray adverse outcome pathways (AOPs). Moreover, the intrinsic diversity of the cell types composing the exposed organs and tissues in living organisms poses a problem when transferring in vivo experimentation into cell-based in vitro systems. To overcome these limitations, we have profiled genome-wide DNA methylation, mRNA and microRNA expression in three human cell lines representative of relevant cell types of the respiratory system, A549, BEAS-2B and THP-1, exposed to a low dose of ten carbon nanomaterials (CNMs) for 48 h. We applied advanced data integration and modelling techniques in order to build comprehensive regulatory and functional maps of the CNM effects in each cell type. We observed that different cell types respond differently to the same CNM exposure even at concentrations exerting similar phenotypic effects. Furthermore, we linked patterns of genomic and epigenomic regulation to intrinsic properties of CNMs. Interestingly, DNA methylation and microRNA expression only partially explain the mechanism of action (MOA) of CNMs. Taken together, our results strongly support the implementation of approaches based on multi-omics screenings on multiple tissues/cell types, along with systems biology-based multi-variate data modelling, in order to build more accurate AOPs.
    Peer reviewed

    Supervised Methods for Biomarker Detection from Microarray Experiments

    Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations from both the biological and the computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.
    Non peer reviewed

    INSIdE NANO : a systems biology framework to contextualize the mechanism-of-action of engineered nanomaterials

    Engineered nanomaterials (ENMs) are widely present in our daily lives. Despite the efforts to characterize their mechanism of action in multiple species, their possible implications in human pathologies are still not fully understood. Here we performed an integrated analysis of the effects of ENMs on human health by contextualizing their transcriptional mechanism-of-action with respect to drugs, chemicals and diseases. We built a network of interactions of over 3,000 biological entities and developed a novel computational tool, INSIdE NANO, to infer new knowledge about ENM behavior. We highlight a striking association between metal and metal-oxide nanoparticles and major neurodegenerative disorders. Our novel strategy opens possibilities to achieve fast and accurate read-across evaluation of ENMs and other chemicals based on their biosignatures.
    Peer reviewed

    Silver, titanium dioxide, and zinc oxide nanoparticles trigger miRNA/isomiR expression changes in THP-1 cells that are proportional to their health hazard potential

    After over a decade of nanosafety research, it is indisputable that the vast majority of nano-sized particles induce a plethora of adverse cellular responses, the severity of which is linked to the material's physicochemical properties. Differentiated THP-1 cells were previously exposed for 6 h and 24 h to silver, titanium dioxide, and zinc oxide nanoparticles at the maximum molar concentration at which no more than 15% cellular cytotoxicity was observed. All three nanoparticles differed in the extent of induction of biological pathways corresponding to immune response signaling and metal ion homeostasis. In this study, we integrated gene and miRNA expression profiles from the same cells to propose miRNA biomarkers of adverse exposure to metal-based nanoparticles. We employed RNA sequencing together with a quantitative strategy that also enables analysis of the overlooked repertoire of length and sequence miRNA variants called isomiRs. Whilst only modest changes in expression were observed within the first 6 h of exposure, the miRNA/isomiR (miR) profiles of each nanoparticle were unique. Via canonical correlation and pathway enrichment analyses, we identified a co-regulated miR-mRNA cluster, predicted to be highly relevant for cellular response to metal ion homeostasis. These miRs were annotated to be canonical or variant isoforms of hsa-miR-142-5p, -342-3p, -5100, -6087, -6894-3p, and -7704. Hsa-miR-5100 was differentially expressed in response to each nanoparticle in both the 6 h and 24 h exposures. Taken together, this co-regulated miR-mRNA cluster could represent potential biomarkers of sub-toxic metal-based nanoparticle exposure.
    Peer reviewed

    INfORM : Inference of NetwOrk Response Modules

    Summary: Detecting and interpreting responsive modules from gene expression data by using network-based approaches is a common but laborious task. It often requires the application of several computational methods implemented in different software packages, forcing biologists to compile complex analytical pipelines. Here we introduce INfORM (Inference of NetwOrk Response Modules), an R shiny application that enables non-expert users to detect, evaluate and select gene modules with high statistical and biological significance. INfORM is a comprehensive tool for the identification of biologically meaningful response modules from consensus gene networks inferred by using multiple algorithms. It is accessible through an intuitive graphical user interface allowing for a level of abstraction from the computational steps.
    Peer reviewed

    BACA: bubble chArt to compare annotations
